Abstract —

In this paper, we propose a singular value decomposition (SVD) algorithm with a superlinear convergence rate, which is suitable for the beamforming mechanism in MIMO-OFDM channels with short coherence time or short training sequences. The proposed superlinear-convergence SVD (SL-SVD) algorithm has the following features: 1) a superlinear convergence rate; 2) the ability to be extended to smaller numbers of transmit and receive antennas; 3) insensitivity to dynamic range problems during the iterative process in hardware implementations; and 4) low computational cost. We verify the proposed design with a VLSI implementation in 90 nm CMOS technology. The post-layout result of the design has a core area of 0.48 mm² and a power consumption of 18 mW. Our design can achieve 7 M channel matrices/s and can be extended to deal with different transmit and receive antenna sets.

Keywords — Beamforming, multiple-input multiple-output (MIMO)-orthogonal frequency division multiplexing (OFDM), precoding, singular value decomposition (SVD), superlinear.
I. INTRODUCTION

The demand for high-throughput wireless transmission, as in IEEE 802.11n WLAN systems and IEEE 802.16e WiMAX systems, continues to grow. Antenna arrays at both the transmitter and the receiver form multiple-input multiple-output (MIMO) transceivers that enhance the data throughput significantly. In MIMO orthogonal frequency division multiplexing (MIMO-OFDM) wireless systems, the data stream can be demultiplexed into several substreams transmitted by different antennas, which improves the bit-error-rate (BER) or throughput performance of the overall communication system by exploiting transmit diversity.

The SVD of the channel matrix in a MIMO-OFDM system has been shown to yield the singular vector matrices for optimum linear precoding and linear receivers [1]. In modern MIMO-OFDM communication systems with high-throughput requirements, such as IEEE 802.11n, the time interval for sending the precoding matrix to the transmitter is specified [2].

In other words, the time available for the SVD of one complex matrix is limited to about 400 ns. When the channel has a short coherence time, the information derived by the SVD must be sent from the receiver to the transmitter as soon as possible to maintain beamforming performance. The decomposition time and accuracy therefore greatly affect the beamforming performance.

The right singular vector matrix derived from the SVD of the channel matrix is the optimal precoding matrix for linear detectors such as zero-forcing (ZF) and minimum mean square error (MMSE) detectors [3]. Several SVD algorithms have been studied for MIMO-OFDM applications. The traditional power iteration algorithm [4] can be used to solve the SVD problem; however, it has only a linear convergence rate and becomes much slower when the channel matrix has multiple similar singular values. An algorithm that updates the singular vectors of the channel matrix by periodic pre- and post-multiplication by Jacobi rotation matrices was proposed in [5], but its computational cost is high. In [6], the authors proposed an adaptive SVD algorithm for MIMO applications without channel state information (CSI), with practical hardware implementations in [7]. Nevertheless, its convergence requires hundreds of samples per channel matrix, and such a long convergence time is not suitable for MIMO channels with short coherence time or short training sequences. Another adaptive SVD beamforming algorithm based on perturbation theory was proposed in [8]; its computational cost is also high, and its iterative divisions cause performance degradation in practical hardware implementations with severe quantization effects. In [9], a hardware-efficient SVD algorithm and VLSI architecture for steering matrix computation was proposed. It uses bidiagonalization, diagonalization, and Givens rotations to achieve high processing throughput. The resulting VLSI implementation in 0.18 µm technology requires 3.3 µs to complete the SVD of one complex matrix, which is still more than 8 times the critical requirement (i.e., 400 ns) in IEEE 802.11n systems. In addition, the algorithms mentioned above have only linear convergence speeds. Hence, they may not be suitable for MIMO channels with short coherence time or for the short response time required by the MIMO-OFDM specifications.

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories
Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., as well as in Cabell's Directories of Publishing Opportunities, U.S.A.
International Journal of Management, IT and Engineering
http://www.ijmra.us
June 2013
ISSN: 2249-0558



In this work, we propose a superlinear-convergence SVD (SL-SVD) algorithm and architecture with four features. 1) Its superlinear convergence rate makes it at least 25 times faster than the referenced works. 2) It can be extended to smaller numbers of transmit and receive antennas without hardware overhead. 3) It is insensitive to dynamic range problems during the iterative process; system simulations of IEEE 802.11n systems show that only 10-bit precision is required. This leads to a small area, a short critical path, and over five times better normalized area efficiency in VLSI implementations compared with related works. 4) The comparison of computational cost in Section IV shows that the proposed SL-SVD reduces complexity by at least 25% compared with the algorithms of [7] and [9]. Finally, we implement the SL-SVD beamforming algorithm in hardware in 90 nm technology. The chip has a core area of 0.48 mm² and a power consumption of 18 mW. It achieves 7 M channel matrices/s, equivalently about 140 ns per matrix, which satisfies the critical specification of 400 ns per matrix in IEEE 802.11n systems. In addition, the proposed SL-SVD design can be extended to deal with different transmit and receive antenna sets. The post-layout simulation is also verified with commercial electronic design automation (EDA) tools.

The paper is organized as follows. The system model is described in Section II, and the operation of the proposed SL-SVD algorithm is detailed in Section III. The architecture design and VLSI implementation results are presented in Section IV, and the simulation results in Section V.
II. SYSTEM MODEL

Consider a wireless MIMO-OFDM system in a frequency-nonselective, slowly fading channel. Suppose $N_t$ and $N_r$ antennas are used at the transmitter and receiver, respectively. The equivalent channel model is given by



$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$

where $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the complex channel matrix whose $(p,q)$th element is the random fading between the $p$th receive and $q$th transmit antennas, and $\mathbf{n} \in \mathbb{C}^{N_r \times 1}$ is the additive noise, modeled as a zero-mean, circularly symmetric, complex Gaussian random vector with statistically independent elements. The $p$th element of $\mathbf{s} \in \mathbb{C}^{N_t \times 1}$ is the symbol transmitted at the $p$th transmit antenna, and the $p$th element of $\mathbf{r} \in \mathbb{C}^{N_r \times 1}$ is the symbol received at the $p$th receive antenna.



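After deriving the CSI, we can decompose the channel matrix into SVD form as follows:

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_t) & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}_{N_r \times N_t} \tag{2}$$

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_t \ge 0 \tag{3}$$

$$\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_r}] \tag{4}$$

$$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{N_t}] \tag{5}$$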
$\mathbf{U}$ is an $N_r \times N_r$ unitary matrix, $\mathbf{V}$ is an $N_t \times N_t$ unitary matrix, and $t = \min(N_r, N_t)$. The $\mathbf{u}_i$'s and $\mathbf{v}_i$'s are the corresponding left and right singular vectors. $\boldsymbol{\Sigma}$ is an $N_r \times N_t$ matrix whose main diagonal entries are the nonnegative square roots of the eigenvalues of $\mathbf{H}^H\mathbf{H}$ and whose other entries are zero, and $(\cdot)^H$ denotes the Hermitian (conjugate transpose) operation. In (2), the diagonal matrix $\boldsymbol{\Sigma}$ is unique for a given channel matrix, whereas the unitary matrices $\mathbf{U}$ and $\mathbf{V}$ are not. Substituting the SVD of $\mathbf{H}$ into (1) gives
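$$\mathbf{r} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{s} + \mathbf{n} \tag{6}$$

Multiplying both sides of (6) by $\mathbf{U}^H$, it can be rewritten as

$$\mathbf{r}' = \boldsymbol{\Sigma}\mathbf{s}' + \mathbf{n}' \tag{7}$$

where $\mathbf{r}' = \mathbf{U}^H\mathbf{r}$, $\mathbf{s}' = \mathbf{V}^H\mathbf{s}$, and $\mathbf{n}' = \mathbf{U}^H\mathbf{n}$. The distribution of $\mathbf{n}'$ is the same as that of $\mathbf{n}$, because a zero-mean, circularly symmetric complex Gaussian vector with independent, equal-variance elements is invariant under unitary transformation. In other words, multiplying the AWGN by a unitary matrix does not cause any noise enhancement. The multiantenna channel is therefore equivalent to at most $\min(N_t, N_r)$ independent parallel Gaussian subchannels, each with a gain equal to the corresponding singular value of the channel matrix $\mathbf{H}$.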
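As a numerical illustration of (6) and (7), the Python sketch below builds a hypothetical 2 × 2 channel $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ from unitary matrices and singular values chosen purely for demonstration (they are not taken from this paper). It then checks that $\mathbf{U}^H\mathbf{H}\mathbf{V}$ returns the diagonal $\boldsymbol{\Sigma}$ and that $\mathbf{U}^H$ leaves the noise norm unchanged. The helpers `matmul`, `herm`, `unitary2`, and `vec_norm` are illustrative names, and plain Python lists stand in for a linear algebra library.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def unitary2(theta, phi):
    """2x2 complex unitary matrix (rotation plus phase); an illustrative choice."""
    c, s, e = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
    return [[c, -s * e.conjugate()], [s * e, c]]


def vec_norm(col):
    """2-norm of a column vector stored as [[x0], [x1], ...]."""
    return math.sqrt(sum(abs(row[0]) ** 2 for row in col))


# Hypothetical channel H = U Sigma V^H with sigma_1 = 3 and sigma_2 = 1.
U = unitary2(0.3, 0.7)
V = unitary2(1.1, -0.4)
Sigma = [[3.0, 0.0], [0.0, 1.0]]
H = matmul(matmul(U, Sigma), herm(V))

# U^H H V should equal Sigma: two parallel subchannels with gains 3 and 1.
Y = matmul(matmul(herm(U), H), V)

# U^H rotates the noise vector but does not change its norm.
n = [[0.2 + 0.1j], [-0.5 + 0.3j]]
n_prime = matmul(herm(U), n)

for row in Y:
    print([round(abs(x), 6) for x in row])
print(round(vec_norm(n), 6), round(vec_norm(n_prime), 6))
```

The printed magnitudes form a diagonal pattern with gains 3 and 1, i.e., two decoupled subchannels whose gains are the singular values, and the two noise norms agree.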



III. THE PROPOSED SL-SVD ALGORITHM 

Our goal is to develop an iterative SVD algorithm with a high convergence rate at an acceptable computational cost. Most iterative SVD algorithms try to reduce the computational cost of each iteration, at the expense of a larger number of iterations. If we can greatly reduce the number of iterations by moderately increasing the computational cost of each iteration, the overall computational cost can be lowered while achieving an even higher convergence rate. The proposed SL-SVD has a superlinear convergence rate, and its detailed procedures are described in the following subsections.



A. Initial Stage and Iterative Process 



To handle MIMO-OFDM channels with short coherence time or short training sequences, we propose a superlinear-convergence SVD (SL-SVD) algorithm for closed-loop beamforming. From (2), the results of the SVD process consist of singular values and singular vectors. The main idea of the proposed SL-SVD algorithm is to derive the singular vectors before the singular values. Deriving the singular vectors first has a significant advantage: we only have to care about the direction of each singular vector, not its norm.

In the proposed SL-SVD algorithm, we do not compute the decomposition directly. Instead, we derive the directions of the right singular vectors by iterative computation. The convergence rate is enhanced by applying matrix multiplications iteratively. At the same time, we apply the proposed adaptive binary shift mechanism to prevent the growth of the dynamic range during the iterative multiplications. Unlike the traditional power iteration method [4] and the adaptive methods [6] and [8], this work provides a higher convergence rate for deriving the SVD results and needs only 10-bit precision for the variables in the iterative computation, according to our simulations in Section V.



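To simplify the SVD problem from three unknown matrices, $\mathbf{U}$, $\boldsymbol{\Sigma}$, and $\mathbf{V}$, to two unknown matrices, we first define the initial matrix $\mathbf{P}_1^{(0)}$ as

$$\mathbf{P}_1^{(0)} = \kappa_1^{(0)}\mathbf{H}^H\mathbf{H} = \kappa_1^{(0)}\mathbf{V}\boldsymbol{\Sigma}^H\boldsymbol{\Sigma}\mathbf{V}^H = \kappa_1^{(0)}\sum_{i=1}^{t}\sigma_i^2\mathbf{v}_i\mathbf{v}_i^H \tag{8}$$

$\mathbf{P}_i^{(n)}$ and $\kappa_i^{(n)}$ denote the updating matrix and an arbitrary nonzero coefficient after the $n$th iteration of the proposed algorithm for deriving the $i$th singular vector $\mathbf{v}_i$. The maximum iteration number $n$ can be defined in advance. We then only have to solve for two unknown matrices, $\boldsymbol{\Sigma}$ and $\mathbf{V}$.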
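Because $\mathbf{P}_1^{(0)}$ in (8) is proportional to $\sum_i \sigma_i^2\mathbf{v}_i\mathbf{v}_i^H$, squaring it $n$ times gives a matrix proportional to $\sum_i \sigma_i^{2^{n+1}}\mathbf{v}_i\mathbf{v}_i^H$, so the $\mathbf{v}_1$ term dominates after very few multiplications. The Python sketch below assumes the squaring update $\mathbf{P}^{(n)} = \mathbf{P}^{(n-1)}\mathbf{P}^{(n-1)}$ suggested by the Fig. 1 labels; it is not the authors' implementation. The power-of-two rescaling in `binary_shift` stands in for the adaptive binary shift, our choice of $\kappa = 2^{-e}$ (normalizing the largest diagonal entry into [0.5, 1)) is an assumption, and floating point replaces the 10-bit fixed-point arithmetic. The channel values are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def binary_shift(p):
    """Rescale P by 2**-e so its largest diagonal magnitude falls in [0.5, 1).

    Plays the role of the coefficient kappa; only the direction of v_i matters.
    """
    peak = max(abs(p[i][i]) for i in range(len(p)))
    _, e = math.frexp(peak)  # peak = m * 2**e with 0.5 <= m < 1
    return [[x * 2.0 ** (-e) for x in row] for row in p]


def dominant_right_vector(h, n_iter=4):
    """Estimate v_1 from P(0) = H^H H by repeated squaring P(n) = P(n-1) P(n-1)."""
    p = binary_shift(matmul(herm(h), h))
    for _ in range(n_iter):
        p = binary_shift(matmul(p, p))
    # Every nonzero column of P(n) is nearly parallel to v_1; take the strongest one.
    j = max(range(len(p)), key=lambda k: abs(p[k][k]))
    col = [p[i][j] for i in range(len(p))]
    norm = math.sqrt(sum(abs(x) ** 2 for x in col))
    return [x / norm for x in col]


# Hypothetical 2x2 channel with sigma_1 = 3, sigma_2 = 1 and known right vectors.
c, s, ph = math.cos(1.1), math.sin(1.1), cmath.exp(-0.4j)
V = [[c, -s * ph.conjugate()], [s * ph, c]]  # columns are v_1 and v_2
U = [[math.cos(0.3), -math.sin(0.3)], [math.sin(0.3), math.cos(0.3)]]
H = matmul(matmul(U, [[3.0, 0.0], [0.0, 1.0]]), herm(V))

v_hat = dominant_right_vector(H, n_iter=4)
v1 = [V[0][0], V[1][0]]
alignment = abs(sum(a.conjugate() * b for a, b in zip(v1, v_hat)))
print(f"|v1^H v_hat| = {alignment:.15f}")
```

With $\sigma_1 = 3$ and $\sigma_2 = 1$, four squarings correspond to $(\mathbf{H}^H\mathbf{H})^{16}$, which suppresses the $\mathbf{v}_2$ term by $(1/9)^{16} \approx 5 \times 10^{-16}$; plain power iteration would need 16 matrix-vector products to reach the same ratio.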



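B. Deflation

After $\hat{\mathbf{v}}_i$ is found, its correlated components in $\mathbf{P}_i$ should be eliminated to derive the next estimated singular vector $\hat{\mathbf{v}}_{i+1}$. The singular vectors $\hat{\mathbf{v}}_i$ have the following two properties:

Summation property:

$$\sum_{i=1}^{N_t}\hat{\mathbf{v}}_i\hat{\mathbf{v}}_i^H = \mathbf{I}_{N_t} \tag{9}$$

and

Orthogonal property:

$$\hat{\mathbf{v}}_i^H\hat{\mathbf{v}}_j = 0, \quad \forall\, i \ne j \tag{10}$$

where $\mathbf{I}_{N_t}$ is an $N_t \times N_t$ identity matrix.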
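This section does not spell out the deflation formula, so the Python sketch below assumes a projection-based deflation, $\mathbf{P}_{i+1}^{(0)} = (\mathbf{I} - \hat{\mathbf{v}}_i\hat{\mathbf{v}}_i^H)\mathbf{P}_i^{(0)}(\mathbf{I} - \hat{\mathbf{v}}_i\hat{\mathbf{v}}_i^H)$, which removes the $\hat{\mathbf{v}}_i$ component by relying on properties (9) and (10). It checks both properties for a hypothetical 2 × 2 unitary $\mathbf{V}$ and shows that, after deflating $\mathbf{v}_1$ from $\sum_i \sigma_i^2\mathbf{v}_i\mathbf{v}_i^H$ with $\sigma = (3, 1)$, only the $\mathbf{v}_2$ term remains. The function names are illustrative.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def outer(u, v):
    """u v^H for column vectors stored as flat lists."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]


def deflate(p, v):
    """Assumed deflation: (I - v v^H) P (I - v v^H) removes direction v from P."""
    vv = outer(v, v)
    q = [[(1.0 if i == j else 0.0) - vv[i][j] for j in range(len(p))]
         for i in range(len(p))]
    return matmul(matmul(q, p), q)


# Hypothetical right singular vectors (columns of a 2x2 unitary V).
c, s, ph = math.cos(1.1), math.sin(1.1), cmath.exp(-0.4j)
v1 = [c, s * ph]
v2 = [-s * ph.conjugate(), c]

# Summation property (9): v1 v1^H + v2 v2^H = I.
summation = [[a + b for a, b in zip(r1, r2)]
             for r1, r2 in zip(outer(v1, v1), outer(v2, v2))]
# Orthogonal property (10): v1^H v2 = 0.
inner = sum(a.conjugate() * b for a, b in zip(v1, v2))

# P(0) = 9 v1 v1^H + 1 v2 v2^H (sigma = 3, 1); deflating v1 should leave v2 v2^H.
p0 = [[9 * a + b for a, b in zip(r1, r2)]
      for r1, r2 in zip(outer(v1, v1), outer(v2, v2))]
p1 = deflate(p0, v1)
target = outer(v2, v2)
residual = max(abs(p1[i][j] - target[i][j]) for i in range(2) for j in range(2))
print(f"|v1^H v2| = {abs(inner):.2e}, deflation residual = {residual:.2e}")
```

By (9), $\mathbf{I} - \mathbf{v}_1\mathbf{v}_1^H = \mathbf{v}_2\mathbf{v}_2^H$ in this 2 × 2 case, which is why the deflated matrix reduces exactly to the $\mathbf{v}_2$ term.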



C. Left Singular Vector and Singular Value Matrix Derivation 
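After the matrix $\hat{\mathbf{V}}$ is derived, we multiply the channel matrix $\mathbf{H}$ by $\hat{\mathbf{V}}$:

$$\mathbf{T} = \mathbf{H}\hat{\mathbf{V}} = [\mathbf{H}\hat{\mathbf{v}}_1, \mathbf{H}\hat{\mathbf{v}}_2, \ldots, \mathbf{H}\hat{\mathbf{v}}_{N_t}] = (\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H)\hat{\mathbf{V}} = \hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}} \approx \mathbf{U}\boldsymbol{\Sigma} \tag{11}$$

Equivalently, each column of $\mathbf{T}$, $\mathbf{H}\hat{\mathbf{v}}_i$, can be obtained as soon as $\hat{\mathbf{v}}_i$ is derived. The estimated singular values and left singular vectors can then be derived as

$$\hat{\sigma}_i = \hat{\boldsymbol{\Sigma}}(i,i) = \|\mathbf{T}(:,i)\|_2, \qquad \hat{\mathbf{u}}_i = \hat{\mathbf{U}}(:,i) = \frac{\mathbf{T}(:,i)}{\|\mathbf{T}(:,i)\|_2} \tag{12}$$

where $\hat{\boldsymbol{\Sigma}}$ and $\hat{\mathbf{U}}$ are the estimated singular value matrix and the estimated left singular vector matrix, respectively. By computing the norm of each column of $\mathbf{H}\hat{\mathbf{V}}$ and normalizing the column vectors, we derive $\hat{\boldsymbol{\Sigma}}$ and $\hat{\mathbf{U}}$ without any iterative multiplication.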
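Equations (11) and (12) need only one matrix product and a set of column normalizations. The Python sketch below applies them to a hypothetical 2 × 2 channel whose right singular vectors are known, using the exact $\mathbf{V}$ as $\hat{\mathbf{V}}$ for illustration (in the algorithm, $\hat{\mathbf{V}}$ comes from the iterative stage). The function `left_vectors_and_values` is an illustrative name, and the sketch assumes a full-rank $\mathbf{H}$ so that no column of $\mathbf{T}$ is zero.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def left_vectors_and_values(h, v_hat):
    """Return (sigma_hat, U_hat) from T = H V_hat as in (11)-(12).

    Assumes every column of T is nonzero (full-rank H), so no division by zero.
    """
    t = matmul(h, v_hat)
    rows, cols = len(t), len(t[0])
    sigma_hat, u_cols = [], []
    for i in range(cols):
        col = [t[r][i] for r in range(rows)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in col))  # sigma_i = ||T(:, i)||_2
        sigma_hat.append(norm)
        u_cols.append([x / norm for x in col])           # u_i = T(:, i) / ||T(:, i)||_2
    u_hat = [[u_cols[i][r] for i in range(cols)] for r in range(rows)]
    return sigma_hat, u_hat


# Hypothetical channel with known V and sigma = (3, 1).
c, s, ph = math.cos(1.1), math.sin(1.1), cmath.exp(-0.4j)
V = [[c, -s * ph.conjugate()], [s * ph, c]]
U = [[math.cos(0.3), -math.sin(0.3)], [math.sin(0.3), math.cos(0.3)]]
H = matmul(matmul(U, [[3.0, 0.0], [0.0, 1.0]]), herm(V))

# Exact V used as V_hat here; in SL-SVD it would come from the iterative stage.
sigma_hat, U_hat = left_vectors_and_values(H, V)
print("sigma_hat =", [round(x, 12) for x in sigma_hat])
```

Since $\mathbf{H}\mathbf{V} = \mathbf{U}\boldsymbol{\Sigma}$, each column of $\mathbf{T}$ is $\sigma_i\mathbf{u}_i$, so the norm gives $\sigma_i$ and the normalized column gives $\mathbf{u}_i$ directly, with no iteration.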

Note that the main computations and storage are related to the matrix $\hat{\mathbf{V}}$, and only a small word length is required in the iterative multiplication thanks to the proposed adaptive binary shift mechanism, which reduces both the critical path and the required hardware. The computation of $\hat{\boldsymbol{\Sigma}}$ and $\hat{\mathbf{U}}$ requires no iterative process and lies outside the loop, so a longer word length can be used to store $\hat{\boldsymbol{\Sigma}}$ and $\hat{\mathbf{U}}$ for higher overall accuracy without much hardware overhead or a longer critical path.



D. Orthogonality Reconstruction (OR) 

In practical hardware implementations, all elements are expressed in finite precision. The orthogonality among the singular vectors (the column vectors of $\mathbf{U}$ and $\mathbf{V}$) is then corrupted, which induces interference among the transmitted substreams. We therefore propose an operation called OR to preserve orthogonality as much as possible. Applying the SVD to the channel matrix $\mathbf{H}$, we have

$$\boldsymbol{\Sigma} = \mathbf{U}^H\mathbf{H}\mathbf{V} \tag{13}$$



Loss of orthogonality among the singular vectors produces nonzero off-diagonal entries in the estimated diagonal matrix $\hat{\boldsymbol{\Sigma}}$. Such nonzero off-diagonal entries cause interference among the antennas and inaccurate singular values, which degrade the BER. The orthogonality among singular vectors should therefore be handled carefully. In a fixed-point design, however, it is corrupted by quantization error and by inaccurate deflation due to finite precision. In particular, error propagation induced by the deflation stage may severely damage the orthogonality among singular vectors. Take two singular vectors as an example:

$$\hat{\mathbf{v}}_i^H\hat{\mathbf{v}}_j = \epsilon, \quad \forall\, i \ne j \tag{14}$$



where $\hat{\mathbf{v}}_i$ and $\hat{\mathbf{v}}_j$ are two singular vectors that should be orthogonal. If they are perfectly orthogonal, $\epsilon$ equals zero. If their orthogonality is destroyed only by quantization error, the magnitude of $\epsilon$ is close to the accuracy that the fixed-point format can represent. However, error propagation induced by the deflation stage may make $\epsilon$ hundreds of times larger than the system accuracy. The loss of orthogonality caused by quantization error cannot be prevented, but orthogonality reconstruction can be applied in fixed-point operation to eliminate the loss caused by the deflation stage and improve the performance.

For orthogonality reconstruction, we first consider the data flow in Fig. 1. Note that $\hat{\mathbf{v}}_1$, which corresponds to the largest singular value, does not suffer from errors caused by the deflation operation. For each $\hat{\mathbf{v}}_i$ with $i > 1$, in contrast, the inaccurate residual components along the previously derived singular vectors are removed by applying the Gram-Schmidt process with respect to the previously reconstructed vectors $\hat{\mathbf{v}}_{oc,1}, \ldots, \hat{\mathbf{v}}_{oc,i-1}$. The operation can be expressed as

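$$\hat{\mathbf{q}}_1 = \hat{\mathbf{v}}_1 \tag{15}$$

$$\hat{\mathbf{q}}_i = \hat{\mathbf{v}}_i - \sum_{k=1}^{i-1}\left(\hat{\mathbf{v}}_{oc,k}^H\hat{\mathbf{v}}_i\right)\hat{\mathbf{v}}_{oc,k}, \quad i \ge 2 \tag{16}$$

$$\hat{\mathbf{v}}_{oc,i} = \frac{\hat{\mathbf{q}}_i}{\|\hat{\mathbf{q}}_i\|_2} \tag{17}$$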
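The orthogonality reconstruction in (15)-(17) is a classical Gram-Schmidt pass over the columns of $\hat{\mathbf{V}}$ in order of decreasing singular value. The Python sketch below applies it to two hypothetical vectors, where the second has been given an arbitrary 10⁻³ leakage along the first to mimic a deflation error, and compares their inner product before and after OR. The function name `orthogonality_reconstruction` is illustrative, and floating point stands in for the fixed-point data path.

```python
import math


def inner(a, b):
    """a^H b for flat complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def orthogonality_reconstruction(v_hat):
    """Apply (15)-(17): Gram-Schmidt against previously reconstructed vectors."""
    v_oc = []
    for v in v_hat:
        q = list(v)                                    # (15): q_1 = v_1
        for w in v_oc:                                 # (16): remove projections
            coef = inner(w, v)
            q = [qi - coef * wi for qi, wi in zip(q, w)]
        norm = math.sqrt(abs(inner(q, q)))
        v_oc.append([qi / norm for qi in q])           # (17): normalize
    return v_oc


# Hypothetical estimates: v2 is exactly orthogonal to v1 before the leakage.
v1 = [0.6 + 0.0j, 0.8j]
v2 = [0.8 + 0.0j, -0.6j]
leaky = [a + 1e-3 * b for a, b in zip(v2, v1)]
leaky_norm = math.sqrt(abs(inner(leaky, leaky)))
v2_est = [x / leaky_norm for x in leaky]

before = abs(inner(v1, v2_est))
v_oc = orthogonality_reconstruction([v1, v2_est])
after = abs(inner(v_oc[0], v_oc[1]))
print(f"before OR: {before:.2e}, after OR: {after:.2e}")
```

In this example the norm of $\hat{\mathbf{q}}_2$ before normalization is $1/\sqrt{1 + 10^{-6}}$, which is very close to 1; this matches the observation that $\|\hat{\mathbf{q}}_i\|_2$ stays near unity when the leakage is small.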
After orthogonality reconstruction is applied to all column vectors of $\hat{\mathbf{V}}$, most of the interference caused by the inaccurate deflation process can be avoided. In most cases,

$$\|\hat{\mathbf{q}}_i\|_2 \approx 1 \tag{18}$$

E. Algorithm Flow and the Architecture



Fig. 1 shows the flowchart of the proposed SL-SVD algorithm. The detailed steps are listed below.



[Fig. 1 flowchart blocks: given channel matrix H → initial matrix of the kth singular vector → matrix multiplication, $\mathbf{P}^{(n)} = \mathbf{P}^{(n-1)}\mathbf{P}^{(n-1)}$ → adaptive binary shift → deflation → orthogonality reconstruction → singular vectors and singular values, $\mathbf{T} = \mathbf{H}\hat{\mathbf{V}} \approx \mathbf{U}\boldsymbol{\Sigma}$, $\hat{\sigma}_i = \|\mathbf{T}(:,i)\|_2$]

Fig. 1. The flowchart of the proposed superlinear-convergence SVD algorithm



Step 1) Given the complex channel matrix $\mathbf{H}$.

Step 2) Derive the updating matrix $\mathbf{P}_k^{(0)}$ for the right singular vector corresponding to the $k$th singular value, and perform the matrix multiplication.

Step 3) Use the adaptive binary shift to approach the desired singular vector under the word-length precision constraint.

Step 4) Check whether the preset maximum iteration number (chosen to be 4 for the worst case of 4 × 4 matrices according to the simulation results in Section V) has been reached. If not, go back to Step 3; otherwise, go to Step 5.

Step 5) Perform the deflation operation.

Step 6) Check whether all singular vectors have been solved. If not, go to Step 2; otherwise, perform the OR operation and go to Step 7. (For a 4 × 4 matrix, only 3 OR operations are required.)

Step 7) Derive the results $\hat{\mathbf{U}}$, $\hat{\boldsymbol{\Sigma}}$, and $\hat{\mathbf{V}}$.
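Putting Steps 1)-7) together, the Python sketch below runs the whole flow on a hypothetical 3 × 3 complex channel with well-separated singular values (4, 2, 1). It is a floating-point illustration under several assumptions that are ours rather than the paper's: the squaring update $\mathbf{P}^{(n)} = \mathbf{P}^{(n-1)}\mathbf{P}^{(n-1)}$, a power-of-two rescaling in place of the adaptive binary shift, a projection-based deflation, and a full-rank square $\mathbf{H}$. It does not model the 10-bit hardware data path, and `sl_svd_sketch` and `rot` are hypothetical names.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inner(a, b):
    """a^H b for flat vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(abs(inner(v, v)))
    return [x / norm for x in v]


def binary_shift(p):
    """Power-of-two rescaling so the largest diagonal magnitude lies in [0.5, 1)."""
    _, e = math.frexp(max(abs(p[i][i]) for i in range(len(p))))
    return [[x * 2.0 ** (-e) for x in row] for row in p]


def sl_svd_sketch(h, n_iter=4):
    """Floating-point walk through Steps 1-7 for a full-rank square H (illustrative)."""
    nt = len(h[0])
    ident = eye(nt)
    p0 = matmul(herm(h), h)                       # Step 1: P(0) proportional to H^H H
    v_hat = []
    for _ in range(nt):
        p = binary_shift(p0)
        for _ in range(n_iter):                   # Steps 2-4: squaring + shift
            p = binary_shift(matmul(p, p))
        j = max(range(nt), key=lambda k: abs(p[k][k]))
        v = normalize([p[r][j] for r in range(nt)])
        v_hat.append(v)
        q = [[ident[r][k] - v[r] * v[k].conjugate() for k in range(nt)] for r in range(nt)]
        p0 = matmul(matmul(q, p0), q)             # Step 5: assumed projection deflation
    v_oc = []                                     # Step 6: OR by Gram-Schmidt
    for v in v_hat:
        q_vec = list(v)
        for w in v_oc:
            coef = inner(w, v)
            q_vec = [a - coef * b for a, b in zip(q_vec, w)]
        v_oc.append(normalize(q_vec))
    v_mat = [[v_oc[k][r] for k in range(nt)] for r in range(nt)]
    t = matmul(h, v_mat)                          # Step 7: T = H V_hat
    sigma = [math.sqrt(sum(abs(t[r][k]) ** 2 for r in range(len(t)))) for k in range(nt)]
    u_mat = [[t[r][k] / sigma[k] for k in range(nt)] for r in range(len(t))]
    return u_mat, sigma, v_mat


def rot(n, i, j, theta, phi):
    """n x n identity with a 2x2 complex rotation in rows/columns i and j (test helper)."""
    g = eye(n)
    c, s, e = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
    g[i][i], g[i][j], g[j][i], g[j][j] = c, -s * e.conjugate(), s * e, c
    return g


# Hypothetical 3x3 channel with well-separated singular values (4, 2, 1).
U = matmul(rot(3, 0, 1, 0.4, 0.9), rot(3, 1, 2, 1.0, -0.3))
V = matmul(rot(3, 0, 2, 0.7, 0.2), rot(3, 0, 1, -0.5, 1.3))
H = matmul(matmul(U, [[4.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]), herm(V))

U_hat, sigma_hat, V_hat = sl_svd_sketch(H)
print("sigma_hat:", [round(x, 6) for x in sigma_hat])
```

Four squarings per vector raise $\mathbf{H}^H\mathbf{H}$ to the 16th power, which is why a small, fixed iteration count can suffice when the singular values are well separated; closely spaced singular values would need more squarings.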



IV. ARCHITECTURE DESIGN 



The overall architecture of the hardware design is shown in Fig. 2. It is mainly composed of four parts: 1) a matrix-matrix multiplication array for iterative multiplication and deflation; 2) matrix-vector multipliers for orthogonality reconstruction; 3) pipelined vector normalization for deriving singular values and vectors; and 4) specific control circuits and storage for the right singular vectors before and after the OR operation. The desired singular values and left and right singular vectors are obtained after the proposed iterative processing. In Fig. 2(a), the matrix-matrix multipliers are designed for the matrix multiplication. The inputs are two matrices; because the product is Hermitian, only its upper triangular part needs to be computed, so the iterative multiplication cost is reduced by half without performance degradation. For an …, … complex multipliers are required in the matrix-matrix multiplication block. In addition to the iterative multiplication, the deflation operation can also be executed with these multipliers. The adaptive binary shift (A.B.S.) block is designed to solve the problem of exponentially growing values in the matrix during the iterative multiplication. As shown in (19), a binary shift is applied to the whole matrix after each iteration according to the magnitudes of the diagonal elements. The A.B.S. block is simplified to multiplexers and XOR gates only.
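To illustrate why a shift-only block can suffice, the Python sketch below models one hypothetical fixed-point form of the adaptive binary shift: matrix entries are signed integers, and after each squaring the whole matrix is arithmetically shifted right by the number of bits by which the largest diagonal entry exceeds a 10-bit signed word. The exact shift rule of (19) is not reproduced in this section, so this rule is our assumption; it needs only a leading-bit detection and a shift, consistent with a multiplexer-based block. Checking only the diagonal is justified for a positive semidefinite matrix, since $|P_{ij}|^2 \le P_{ii}P_{jj}$. A real-valued 2 × 2 example is used for brevity.

```python
import math

WORD_BITS = 10  # hypothetical signed word length: sign bit + 9 magnitude bits


def matmul(a, b):
    """Exact integer matrix product (models a wide multiply-accumulate result)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adaptive_binary_shift(p, word_bits=WORD_BITS):
    """Arithmetic right shift of every entry so the largest diagonal fits the word.

    For a positive semidefinite P, |P[i][j]|**2 <= P[i][i] * P[j][j], so bounding
    the diagonal also bounds every off-diagonal entry.
    """
    peak = max(abs(p[i][i]) for i in range(len(p)))
    shift = max(0, peak.bit_length() - (word_bits - 1))
    return [[x >> shift for x in row] for row in p]


# Hypothetical quantized P(0) = H^H H for a real 2x2 channel.
p = adaptive_binary_shift([[300, 120], [120, 80]])
largest = []
for _ in range(4):  # squaring updates (our reading of Fig. 1)
    p = adaptive_binary_shift(matmul(p, p))
    largest.append(max(abs(x) for row in p for x in row))

# Reference direction: dominant eigenvector of [[300, 120], [120, 80]] in closed form.
lam1 = (380 + math.sqrt(220 ** 2 + 4 * 120 ** 2)) / 2
ratio_true = (lam1 - 300) / 120
ratio_fixed = p[1][0] / p[0][0]
print(p, largest, round(ratio_fixed, 4), round(ratio_true, 4))
```

Without the shift, each squaring would roughly double the bit width of the entries; with it, every stored entry stays within a 10-bit signed range while the direction of the dominant vector is preserved to within quantization error.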

In Fig. 2(b), the matrix-vector multipliers are used in the orthogonality reconstruction phase (Gram-Schmidt process) and in the computation of $\mathbf{T}$. Two cycles are required to obtain the results of the OR operation. In Fig. 2(c), the pipelined vector normalization is decomposed into the squared vector 2-norm, the inverse square root, the square root, and vector scaling. The digit-by-digit calculation and the digit-recurrence algorithm are adopted for implementing the square root and inverse square root operations, respectively. This block is used to obtain the normalized left and right singular vectors, and the singular values are derived with the square root function.
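The digit-by-digit (bit-by-bit) square root mentioned above resolves one result bit per step using only shifts, additions, subtractions, and comparisons. The Python sketch below is a textbook integer version, offered to illustrate the method rather than to reproduce the circuit used in this design; the 20-bit input width and the function name `isqrt_digit_by_digit` are arbitrary choices.

```python
import math


def isqrt_digit_by_digit(x, width=20):
    """floor(sqrt(x)) for 0 <= x < 2**width, resolving one result bit per step.

    Uses only shifts, additions/subtractions, and comparisons; width must be even.
    """
    if width % 2 or not 0 <= x < (1 << width):
        raise ValueError("width must be even and 0 <= x < 2**width")
    result = 0
    bit = 1 << (width - 2)  # largest power of four representable in `width` bits
    while bit:
        trial = result + bit
        if x >= trial:
            x -= trial
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


# Example: a singular value from a hypothetical squared column norm ||T(:, i)||^2.
norm_sq = 3 ** 2 + 4 ** 2
print(isqrt_digit_by_digit(norm_sq))  # prints 5

# Cross-check against Python's exact integer square root.
mismatches = [n for n in range(1 << 12) if isqrt_digit_by_digit(n) != math.isqrt(n)]
print("mismatches:", len(mismatches))
```

A `width`-bit input needs `width / 2` iterations, one per result bit, which maps naturally onto a pipelined or iterative hardware stage.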



Straightforward implementations of the inverse square root and square root functions are applied in our design, with equivalent gate counts of about 9 k and 0.8 k, respectively. These two function blocks are hardware-expensive and occupy about 6% of the area of the entire design. Although straightforward implementations are employed in this work, a CORDIC-based implementation is a feasible way to mitigate the cost of the square root function. The storage of the right singular vectors before and after the OR operation is shown in Fig. 2(d). With dedicated task arrangement, the stored right singular vectors can be output for the OR operation or for subsequent computation. The fine-tuned right singular vectors can also be stored after the OR operation.



The post-layout analysis of the proposed SL-SVD engine is obtained from Verilog HDL code synthesized with the standard cell library of the UMC 90 nm 1P9M Low-K process, with a core size of 0.48 mm² at a 182 MHz operating frequency. The power consumption is evaluated with Synopsys PrimePower in the 4 × 4 antenna mode. To meet the specification of the IEEE 802.11n standard, the proposed SL-SVD engine supports 16 antenna modes. For IEEE 802.11n applications, we use the SVD engine to decompose the channel matrices of all subcarriers serially. Chip results show that the latency of our SL-SVD engine for a 128-subcarrier MIMO-OFDM system is about 0.3% of the WLAN coherence time [14], so the channel can be regarded as unchanged during processing.
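The throughput figures quoted above can be cross-checked with simple arithmetic, using only numbers stated in this paper (7 M matrices/s, 182 MHz, 128 subcarriers). The per-matrix time is consistent with the approximately 140 ns figure and the 400 ns IEEE 802.11n requirement; the cycle count is our derived estimate, not a reported value.

```latex
\begin{aligned}
T_{\text{matrix}} &= \frac{1}{7 \times 10^{6}\ \text{s}^{-1}} \approx 143\ \text{ns} \;<\; 400\ \text{ns},\\
N_{\text{cycles}} &\approx 143\ \text{ns} \times 182\ \text{MHz} \approx 26\ \text{cycles per matrix},\\
T_{128} &\approx 128 \times 143\ \text{ns} \approx 18.3\ \mu\text{s for all subcarriers}.
\end{aligned}
```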
Fig. 2. (a) Matrix-matrix multiplication and A.B.S. (b) Matrix-vector multiplication. (c) Pipelined vector normalization. (d) Storages of right singular vectors.



The SVD of one complex channel matrix … owing to the superlinear-convergence property of the proposed SL-SVD. In successive matrix processing, the equivalent processing time per matrix can even be reduced to 90 ns. The normalized area efficiency is five times better than that of the referenced works, owing to the low computational cost and the insensitivity to the dynamic range problem. The prototype design can also be extended to different antenna sets.

Only a few iterations are needed to complete the SVD process owing to the superlinear convergence rate of the proposed SL-SVD. The SL-SVD is division-free, and only multiplications are performed in each iteration. The A.B.S. and orthogonality reconstruction (OR) are used for updating and vector correction, so only 10-bit precision is needed in our design. That is why our design is area- and power-efficient.



V. SIMULATION RESULTS 

The validity of the proposed MIMO channel estimation algorithm is investigated via 
Matlab™ simulations. 








Fig. 1: Transmission-based signal generation and modulation approach, in which the signal is generated and a carrier signal is obtained for modulation.

Figures 1 and 2 show the MSE simulation results for the transmitted signal and its response in complex format.
[Figure: real part of the channel; legend: Estimated, Accurate]

Fig. 2: Channel estimation accuracy, comparing the theoretical (accurate) and estimated results. The signal is mapped against the expected output, and the graph shows the variation in the outcomes.




Fig. 3: Channel estimation using a quantization-based approach and its approximation result. The symbol rate defines the performance of the system and can be used to define its efficiency.

Fig. 4: Channel spectrum allocation in dynamic mode (approximation result). The graph shows the spectrum allocation of the system with a continuous signal generation mechanism.

Fig. 8: Channel receiver plot with the defined real coefficients. The symbol prediction accuracy is shown along the in-phase signal and the vibrational result.

From the results shown in Figure 6, it is apparent that the transmitted information signal is properly estimated and recovered.



VI. CONCLUSION

In this paper, we propose a superlinear-convergence SVD algorithm. The algorithm can obtain the SVD of complex MIMO-OFDM channel matrices about 25 times faster than the other referenced algorithms. The superlinear convergence speed makes this algorithm suitable for channels with short coherence time. Moreover, the SL-SVD engine can be extended to decompose smaller channel matrices with little hardware overhead. The total computational cost is low owing to the superlinear convergence rate. A hardware implementation in 90 nm technology is also presented. The chip has a core area of 0.48 mm² and a power consumption of 18 mW, can handle 7 M channel matrices/s, and can be extended to deal with different transmit and receive antenna sets.